NSF PAR Search | NSF Public Access Repository

CALM: Multimodal Cognitive Load Assessment Framework via Engineered and Explainable Features

https://doi.org/10.1109/PerComWorkshops65533.2025.00067

Myrick, Cahree; Ghosh, Indrajeet; Jayarajah, Kasthuri; Roy, Nirmalya (March 2025, IEEE)

Free, publicly-accessible full text available March 17, 2026
CoOpTex: Multimodal Cooperative Perception and Task Execution in Time-Critical Distributed Autonomous System

https://doi.org/10.1109/DCOSS-IoT65416.2025.00033

Anwar, Mohammad Saeid; Ravi, Anuradha; Dey, Emon; Shinde, Gaurav; Ghosh, Indrajeet; Freeman, Jade; Busart, Carl; Harrison, André; Roy, Nirmalya (June 2025, IEEE)

Integrating multimodal data such as RGB and LiDAR from multiple views significantly increases computational and communication demands, which is challenging for resource-constrained autonomous agents that must also meet the time-critical deadlines of mission-critical applications. To address this challenge, we propose CoOpTex, a collaborative task execution framework designed for cooperative perception in distributed autonomous systems (DAS). CoOpTex's contribution is twofold: (a) it fuses multiview RGB images to create a panoramic camera view for 2D object detection and utilizes 360° LiDAR for 3D object detection, improving accuracy with a lightweight Graph Neural Network (GNN) that integrates object coordinates from both perspectives; and (b) to optimize task execution and meet the deadline, it dynamically offloads computationally intensive image stitching tasks to auxiliary devices when available and adjusts frame capture rates for RGB frames based on device mobility and processing capabilities. We implement CoOpTex in real time on static and mobile heterogeneous autonomous agents, reducing deadline violations by 100% while improving frame rates for 2D detection by 2.2 times in stationary and 2 times in mobile conditions, demonstrating its effectiveness in enabling real-time cooperative perception.
Free, publicly-accessible full text available June 9, 2026
Unsupervised Domain Adaptation for Action Recognition via Self-Ensembling and Conditional Embedding Alignment

https://doi.org/10.1109/ICDM59182.2024.00079

Ghosh, Indrajeet; Chugh, Garvit; Md_Faridee, Abu Zaher; Roy, Nirmalya (December 2024, IEEE)

Recent advancements in deep learning-based wearable human action recognition (wHAR) have improved the capture and classification of complex motions, but adoption remains limited due to the lack of expert annotations and domain discrepancies from user variations. Limited annotations hinder the model's ability to generalize to out-of-distribution samples. While data augmentation can improve generalizability, unsupervised augmentation techniques must be applied carefully to avoid introducing noise. Unsupervised domain adaptation (UDA) addresses domain discrepancies by aligning conditional distributions with labeled target samples, but vanilla pseudo-labeling can lead to error propagation. To address these challenges, we propose μDAR, a novel joint optimization architecture comprising three functions: (i) a consistency regularizer between augmented samples to improve model classification generalizability, (ii) a temporal ensemble for robust pseudo-label generation, and (iii) conditional distribution alignment to improve domain generalizability. The temporal ensemble aggregates predictions from past epochs to smooth out noisy pseudo-label predictions, which are then used in the conditional distribution alignment module to minimize the kernel-based class-wise conditional maximum mean discrepancy (kCMMD) between the source and target feature spaces and learn a domain-invariant embedding. The consistency-regularized augmentations ensure that multiple augmentations of the same sample share the same labels; this results in (a) strong generalization with limited source domain samples and (b) consistent pseudo-label generation in target samples. The novel integration of these three modules in μDAR yields an average macro-F1 score improvement of approximately 4-12% over six state-of-the-art UDA methods on four benchmark wHAR datasets.
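The temporal-ensemble step described in this abstract can be illustrated with a minimal sketch. It assumes an exponential moving average with startup bias correction as the aggregation rule; the abstract does not state the exact rule, and the class name `TemporalEnsemble`, the `alpha` momentum, and the `threshold` parameter are hypothetical names chosen for illustration, not taken from μDAR.

```python
class TemporalEnsemble:
    """Running average of per-sample class probabilities across epochs.

    Hypothetical helper: the EMA rule and all parameter names are assumptions.
    """

    def __init__(self, num_samples, num_classes, alpha=0.6):
        self.alpha = alpha  # weight given to the accumulated history
        self.epoch = 0
        self.ema = [[0.0] * num_classes for _ in range(num_samples)]

    def update(self, probs):
        """Fold one epoch of predictions (one probability list per sample) into the average."""
        self.epoch += 1
        for i, row in enumerate(probs):
            for c, p in enumerate(row):
                self.ema[i][c] = self.alpha * self.ema[i][c] + (1 - self.alpha) * p

    def pseudo_labels(self, threshold=0.0):
        """Return (label, confidence) per sample; label is None below the threshold."""
        if self.epoch == 0:
            raise ValueError("call update() at least once first")
        # Bias correction: early averages are shrunk toward the zero initialisation.
        correction = 1 - self.alpha ** self.epoch
        results = []
        for row in self.ema:
            corrected = [v / correction for v in row]
            confidence = max(corrected)
            label = corrected.index(confidence)
            results.append((label if confidence >= threshold else None, confidence))
        return results


# One target sample, two classes; the second epoch's prediction flips to class 1.
ensemble = TemporalEnsemble(num_samples=1, num_classes=2, alpha=0.5)
for epoch_probs in ([[0.9, 0.1]], [[0.4, 0.6]], [[0.8, 0.2]]):
    ensemble.update(epoch_probs)
label, confidence = ensemble.pseudo_labels()[0]
```

In this toy run the single noisy epoch does not change the pseudo-label, which is the smoothing effect the abstract attributes to the ensemble.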
Full Text Available
HeteroSys: Heterogeneous and Collaborative Sensing in the Wild

https://doi.org/10.1109/SMARTCOMP58114.2023.00073

Ghosh, Indrajeet; Goldstein, Adam; Chakma, Avijoy; Freeman, Jade; Gregory, Timothy; Suri, Niranjan; Ramamurthy, Sreenivasan Ramasamy; Roy, Nirmalya (June 2023, IEEE)
STAR-Lite: A light-weight scalable self-taught learning framework for older adults’ activity recognition

https://doi.org/10.1016/j.pmcj.2022.101698

Ramasamy Ramamurthy, Sreenivasan; Ghosh, Indrajeet; Gangopadhyay, Aryya; Galik, Elizabeth; Roy, Nirmalya (December 2022, Pervasive and Mobile Computing)

Full Text Available
SpecTextor: End-to-End Attention-based Mechanism for Dense Text Generation in Sports Journalism

https://doi.org/10.1109/SMARTCOMP55677.2022.00081

Ghosh, Indrajeet; Ivler, Matthew; Ramamurthy, Sreenivasan Ramasamy; Roy, Nirmalya (June 2022, 2022 IEEE International Conference on Smart Computing (SMARTCOMP))

Language-guided smart systems can help to design next-generation human-machine interactive applications. Dense text description is a research area in which systems learn the semantic knowledge and visual features of each video frame and map them to describe the video's most relevant subjects and events. In this paper, we consider untrimmed sports videos as our case study. Generating dense descriptions in the sports domain to supplement journalistic works without relying on commentators and experts requires more investigation. Motivated by this, we propose an end-to-end automated text generator, SpecTextor, that learns the semantic features from untrimmed videos of sports games and generates associated descriptive texts. The proposed approach treats the video as a sequence of frames and generates words sequentially. After splitting videos into frames, we use a pre-trained VGG-16 model to extract features and encode the video frames. With these encoded frames, we posit a Long Short-Term Memory (LSTM) based attention-decoder pipeline that leverages a soft-attention mechanism to map the semantic features to relevant textual descriptions and generate an explanation of the game. Because developing a comprehensive description of the game warrants training on a set of dense time-stamped captions, we leverage two publicly available datasets: ActivityNet Captions and Microsoft Video Description. In addition, we use two decoding algorithms, beam search and greedy search, and compute two evaluation metrics, BLEU and METEOR.
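The difference between the two decoding algorithms named in this abstract can be shown with a small sketch. It is not the SpecTextor decoder: the LSTM is replaced by a hypothetical lookup table, `TOY_MODEL`, whose tokens and probabilities are invented for illustration, and the function names are assumptions.

```python
import math

# Hypothetical next-token probabilities keyed by the prefix generated so far.
TOY_MODEL = {
    (): {"a": 0.6, "b": 0.4},
    ("a",): {"x": 0.55, "y": 0.45},
    ("b",): {"x": 0.9, "y": 0.1},
}


def next_token_probs(prefix):
    """Stand-in for one decoder step: return the distribution over next tokens."""
    return TOY_MODEL[tuple(prefix)]


def greedy_decode(length):
    """Pick the single most probable token at every step."""
    seq, logp = [], 0.0
    for _ in range(length):
        probs = next_token_probs(seq)
        token = max(probs, key=probs.get)
        seq.append(token)
        logp += math.log(probs[token])
    return seq, logp


def beam_decode(length, beam_width):
    """Keep the beam_width best partial sequences by total log-probability."""
    beams = [([], 0.0)]
    for _ in range(length):
        candidates = [
            (seq + [tok], logp + math.log(p))
            for seq, logp in beams
            for tok, p in next_token_probs(seq).items()
        ]
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams[0]


greedy_seq, greedy_logp = greedy_decode(2)
beam_seq, beam_logp = beam_decode(2, beam_width=2)
```

With this table, greedy search commits to `a` at the first step and ends with a sequence of probability 0.33, while a beam of width 2 keeps `b` alive and finds the sequence of probability 0.36. A beam of width 1 reduces to greedy search.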
Full Text Available
STAR: A Scalable Self-taught Learning Framework for Older Adults’ Activity Recognition

https://doi.org/10.1109/SMARTCOMP52413.2021.00037

Ramamurthy, Sreenivasan Ramasamy; Ghosh, Indrajeet; Gangopadhyay, Aryya; Galik, Elizabeth; Roy, Nirmalya (August 2021, 2021 IEEE International Conference on Smart Computing (SMARTCOMP))

Full Text Available